Deep Reinforcement Learning for Workload Prediction in Federated Cloud Environments

The Federated Cloud Computing (FCC) paradigm gives Cloud Service Providers (CSPs) a scalability advantage over a single Data Center (DC) when preserving their Service Level Agreements (SLAs). However, existing research has primarily focused on Virtual Machine (VM) placement, with less emphasis on energy efficiency and SLA adherence. In this paper, we propose Federated Cloud Workload Prediction with Deep Q-Learning (FEDQWP), a solution that jointly addresses the complex VM placement problem, energy efficiency, and SLA preservation. By leveraging deep learning, the FEDQWP model extracts underlying workload patterns and optimizes resource allocation. We evaluate the approach on real-world workloads and compare it with existing solutions. The results show that our Deep Q-Learning (DQL) model outperforms the other algorithms in terms of CPU utilization, migration time, finished tasks, energy consumption, and SLA violations. Specifically, it achieves a median CPU utilization of 29.02, completes migrations in an average of 0.31 units, finishes an average of 699 tasks, consumes the least energy with an average of 1.85 kWh, and exhibits the lowest proportion of SLA violations, with an average of 0.03. These quantitative results indicate that the proposed method improves performance in FCC environments.


Introduction
The Cloud computing paradigm has revolutionized the way the IT industry operates due to its ease of use, scalability, and the ability to offer flexible services such as Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS) [1,2]. This presents a wide range of options for organizations. However, with the wide-scale adoption of this paradigm, keeping up with user demand has become a high priority. When user demand outpaces the available resources, the organization loses business due to violation of the Service Level Agreement (SLA) [3].
The Federated Cloud Computing (FCC) paradigm introduces a novel solution to this challenge by enabling resource sharing among Cloud Service Providers (CSPs). When a certain CSP is unable to accommodate the resource requirements of a user, it offloads the work to another federation member CSP with available resources to process it [4,5]. This way, FCC allows CSPs to accommodate more users or handle larger workloads than they would be able to with a single Data Center (DC).
However, this federated approach introduces a complex problem of efficiently allocating resources, considering all the DCs within the federation. This is due to the fact that DCs in an FCC environment can vary significantly in terms of hardware configurations, network
1. How can DQL be implemented as an optimal VM scheduling and energy-efficient strategy while respecting SLA agreements in an FCC environment?
2. What is the overall impact of using DL in solving complicated multi-dimensional problems for system performance?
3. How does the proposed novel solution perform compared to existing solutions in terms of energy efficiency, SLA adherence, and cost-effectiveness?
4. How can the proposed solution be extended to handle dynamic workloads and changing resource availability in FCC environments?
5. What are the associated challenges and trade-offs of implementing our solution?
The main contributions of this article include:
• A novel solution, FEDQWP, to the problem of optimizing VM placement and energy efficiency while preserving SLA in FCC environments using DQL.
• A comprehensive evaluation of the proposed solution using real-world workloads, demonstrating its effectiveness compared to existing solutions in the domain.
• A deeper understanding of the benefits and challenges of using DQL in optimizing complex multi-dimensional problems.
• A framework for handling dynamic workloads and changing resource availability in FCC environments.
It is expected that the proposed solution provides a significant contribution to the field of cloud computing by addressing key questions and challenges and providing new directions for potential research. The subsequent sections of this article are structured as follows: Section 2 presents a comparative analysis of the existing literature. Section 3 discusses the architecture of the federation and the operations of FEDQWP. Section 4 explains the findings of the analysis. Section 5 details the interpretation of the results, and the paper concludes in Section 6. Section 7 discusses possible future directions.

Related Works
Much research has been conducted in the field of predicting and allocating resources in the Cloud Computing environment. A Deep Belief Network (DBN) model proposed by Qiu et al. [10] consists of a multi-layer architecture that is capable of extracting high-level features from the historic load data and the logical regression layer using Restricted Boltzmann Machines (RBM). The designed model can be used to forecast the workload for a single VM or for multiple VMs. The Central Processing Unit (CPU) utilization data are fed into the input layer at the bottom, and the top layer provides the predicted output. While the accuracy of the CPU utilization prediction is improved, the resources needed for training are high. Moreover, the authors have not integrated the prediction with other parts of the VM consolidation algorithm.
Beloglazov et al. [14] focused on analyzing past resource usage data using adaptive heuristics for VM consolidation. They employed variations of the Local Regression algorithms to predict the CPU utilization of hosts. This in combination with a Minimum Migration Time (MMT) approach has been put forth as a power-aware model which takes into account the level of SLA violation and the number of VM transfers. While the comparison shows that it provides better results, it does not explain the high variations, where in some instances, high energy consumption results in no SLA violations and low energy consumption results in high SLA violations.
Kumar et al. [15] combined the Long Short Term Memory (LSTM) model and Recurrent Neural Network (RNN) models to solve the issue of dynamic resource scaling in a cloud environment. They argue that the best way to predict resource consumption is to learn the patterns from historical data. While RNNs are able to extract that information, they do not perform well over longer durations in time series problems. Combining them with LSTMs offsets this downside, as LSTMs preserve information over longer periods. The predicted output from the proposed design is then fed into the Resource Manager (RM), which measures the present condition of the DC and scales the resources up or down accordingly. Even though the approach gives higher accuracy, the amount of data and the computational resources required for training might be prohibitive.
Zhu et al. [16] propose a novel solution involving the attention mechanism along with LSTM encoder-decoder pairings. Initially, the scheme extracts the sequential features of the past utilization data using the encoder network. For the next phase, the attention mechanism is employed in the decoder network for batch workload prediction. The context vector takes in the input workload for the encoding phase, and the decoder recursively unravels the context vector, providing an output of the predicted sequence. This method is effective in mitigating error amplification and improving accuracy for longer-duration prediction. However, the larger model needs more processing resources and more epochs to converge, which makes it costlier to train than single-layer models.
A clustering-based workload prediction method is proposed by Gao et al. [17]. This is so that all workload patterns are captured covering heterogeneous tasks. Two kinds of clustering techniques were used: Prototype-Based Clustering (PBC) and Density-based clustering (DBC). Furthermore, it revolves around an approach named m-gap prediction that buffers a gap of specific (m) points of time between input data and the forecasted data points. This allows sufficient time for task scheduling. This, in combination with clustering, pools similar workload patterns into classes which are stored as a model which in turn is used for the final prediction. Even though the prediction accuracy is good, this approach does not scale well to larger datasets and the user has to explicitly specify the number of clusters.
The Savitzky-Golay and Wavelet-supported Stochastic Configuration Networks (SGW-SCN) framework is proposed by Bi et al. [18] as an integrated machine-learning solution for DCs. It uses the SGW filter to eliminate noise while preserving the width and peaks of the signal. It also uses the Haar wavelet method to extract patterns from time series data, and it performs well at various scales of resolution. The SCN is a supervised learning method that uses a randomized learning model for higher accuracy, which also increases the learning speed of the model. However, the tradeoff is a higher resource consumption for training the model.
All of the aforementioned works attempt to deal with workloads of high variance and high dimensional space [19]. This forces the model to grapple with more complex data to come up with a solution. FEDQWP builds on these works and is distinguished by considering both VM placement and energy efficiency while preserving SLA agreements using DQL in FCC environments. The draw of Q-Learning is that it is less computationally expensive to train and deploy than the approaches mentioned above [20]. To the best of our knowledge, no existing solution comprehensively tackles the challenges mentioned previously. FEDQWP represents a significant improvement over existing solutions in the domain, demonstrating the potential of DQL in solving complex multi-dimensional problems in FCC environments.

Proposed Cloud Federation Architecture
Even with so many organizations investing in Cloud Computing technology, there is a finite capacity when it comes to providing resources. To move beyond this limitation, the best approach would be to collaborate and share resources. The applications offered by a CSP should be virtually contained within a VM so that they can be migrated between different CSPs. This pooling of resources is a win-win scenario for all parties involved: the user receives uninterrupted service because the processing request is offloaded, the CSP associated with the user does not violate the SLA and does not over-extend its capacity, and the CSP handling the offloaded request is able to utilize idle computing power to make extra income.
As shown in Figure 1, there will be multiple CSPs. In our case, we will be working with a federation of 3 CSPs. Each CSP has an RM unit that hosts the components of the FEDQWP including the Deep Reinforcement Learning (DRL) agents along with the resource scaling functionalities. The Resource Collector (RC) interfaces with the manager to collect the workload history and store it for future use. We assume that:
• The CSPs are de-centralized and operate on a peer-to-peer basis.
• The RM functions as the resource broker for each CSP.
• Each resource broker has its ledger of available resources.
The user request is packaged into VMs which will be transferred among the various CSPs in the federation. The vital part of the architecture is the RM. As shown in Figure 2, the Resource Manager is connected to the hosts numbered from $m_1$ to $m_M$. Each of the hosts executes the applications inside specific VMs which are isolated from each other and denoted by $a_1$ to $a_A$. Processing requests are sent by consumers of the CSP. Each request occupies one VM container which occupies space within a host along with other similar VM containers. The RM keeps track of the activities of the consumer, the time taken to process the request, and the power consumption of the VMs, and determines how the VM containers are moved within the CSPs.

Q Learning Algorithm Implementation
Our solution to the optimal migration problem is a DQL algorithm to learn the best policy for predicting the workload and scheduling the VM accordingly. The model employs an agent which regularly interacts with and samples the condition of the environment. The agent is aware of the nature of the workload and the energy consumption parameters and makes decisions accordingly for VM placement. Depending on the feedback based on the action taken, the agent is rewarded or penalized accordingly on how the SLA compliance and energy consumption are affected.
The DRL model is trained using a combination of deep Neural Networks (NN) and DRL algorithms. We use a deep NN to approximate the optimal policy, and we use the Q-learning algorithm to update the Q-values of the state-action pairs. The learning agent learns the optimal decisions by interacting with the environment on a "trial-and-error" basis. This means the DRL agent makes a trade-off between exploiting known decisions and exploring new decisions to achieve the optimal policy.
As shown in Figure 3, our architecture relies on:
• S: The finite set S contains the possible states of the environment. The state space includes various features extracted from the collected data, such as the current workload, resource utilization, and energy consumption.
• A: The finite set A(s) is the set of actions available in state s. The action space includes the set of possible actions that can be taken, such as adding or removing VMs, migrating VMs between data centers, and adjusting resource allocation.
• π: The policy that maps states in S to actions in A. π(s, a) denotes the probability of taking action a in state s.
• r: The reward function is used to evaluate the goodness of an action taken by the agent. In our case, the reward function is designed to minimize energy consumption while ensuring SLA compliance. The reward function is defined as a weighted sum of delay, power consumption and migration cost, where the weights are determined based on the importance of each objective.
The SLA is determined based primarily on the response time. The response times of migrations are sorted in ascending order; the value at the 95th percentile then gives the threshold response time that 95% of the requests should meet to satisfy the SLA. Any response time exceeding this 95th percentile value is considered a violation of the SLA.
The problem of training is tackled by a two-pronged solution. There are two NNs, one for training and one for prediction, as illustrated in Figure 4. There are 5 hidden layers between the input and output layers, with 2500 to 5200 nodes for each layer. The ReLU activation function is used to introduce nonlinearity, allowing the agent to develop complex representations. ReLU is chosen over other functions such as Sigmoid because it does not activate all neurons at the same time. As a result, it reaches convergence faster than other activation functions [21]. This, in combination with the DQL algorithm, is utilized for quick decision-making, as workloads have a tendency to fluctuate and decisions have to be taken in real-time [19].
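The 95th-percentile SLA threshold described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names, the nearest-rank percentile rule, and the sample response times are all hypothetical.

```python
def sla_threshold(response_times, percentile=95):
    """Nearest-rank percentile of the response times (assumed rule)."""
    if not response_times:
        raise ValueError("need at least one response time")
    ordered = sorted(response_times)  # ascending order, as in the text
    # Integer ceiling of percentile/100 * n gives the 1-based rank.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def sla_violation_rate(response_times, threshold):
    """Fraction of requests whose response time exceeds the threshold."""
    violations = sum(1 for t in response_times if t > threshold)
    return violations / len(response_times)


# Hypothetical response times: 1.0, 2.0, ..., 100.0
times = [float(t) for t in range(1, 101)]
threshold = sla_threshold(times)
rate = sla_violation_rate(times, threshold)
```

In practice the threshold would presumably be derived from historical response times and then applied to new requests; the sketch applies it to the same sample only to keep the example short.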
The state of the system depends on the migration cost, the power consumption, and the delay involved in resolving VMs. The delay is related to the location of the $i$th host at a given time and the resource requirement of the $i$th VM at a given time. This is denoted by Equation (1), where $S(\varphi)$ is the system state of delay during the $\varphi$th time slice, $\varphi$ is the serial number of the time slice, and $S_{x,y}(\varphi)$ represents the state of delay of the $i$th node (and $i$th VM container).
To represent the main metrics used, $x_{total}$ denotes the total delay, $y_{total}$ the total power consumption, and $z_{total}$ the total computational cost of migration, each obtained by aggregating the individual delay, power consumption, and migration cost values during a unit of time (slice) $\varphi$, where a slice $T_{\varphi}$ is enumerated as $\varphi = 0, \ldots, k$. In Equation (2), $x_{net}$ represents the delay associated with the network between the user and the relevant nodes assigned during a certain time slice, and $x_{comp}$ represents the delay in the computation of application tasks. Both are confined to the $i$th VM container.
In Equation (3), $y^{total}_i(\varphi)$ represents the estimated power consumption of the $i$th node during time slice $\varphi$.
In Equation (4), $z^{total}_i(\varphi)$ represents the cost of migrating the $i$th VM container during time slice $\varphi$.
We focus on minimizing the total cost over time; thus, the reward during a specific time slice is defined by Equation (5), where $R_{\varphi}$ is the reward given to the agent during the $\varphi$th time slice, $\omega_1$ is the weight assigned to the total delay in relation to its impact on the total cost, $\omega_2$ is the weight assigned to the power consumption in relation to the total cost, and $\omega_3$ is the weight assigned to the cost of migration.
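One form of the aggregate metrics and the reward that is consistent with the definitions above and with the negated sum in Equation (6) is sketched below. It is an assumption built from the surrounding text, not a reproduction of the original Equations (2) to (5); in particular, the per-container sums and the negative sign on the weighted total are inferred.

```latex
\begin{align}
x_{total}(\varphi) &= \sum_{i} \left( x_{net,i}(\varphi) + x_{comp,i}(\varphi) \right) \\
y_{total}(\varphi) &= \sum_{i} y^{total}_{i}(\varphi) \\
z_{total}(\varphi) &= \sum_{i} z^{total}_{i}(\varphi) \\
R_{\varphi} &= -\left( \omega_1\, x_{total}(\varphi) + \omega_2\, y_{total}(\varphi) + \omega_3\, z_{total}(\varphi) \right)
\end{align}
```

Under this reading, maximizing $R_{\varphi}$ is equivalent to minimizing the weighted total cost in each time slice.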

DQL Algorithm
The following algorithms provide a step-by-step process for initializing and optimizing the placement of VM containers using our algorithm:

Step 1-VM Container Initialization
In Algorithm 1, we first initialize new_container_infos to hold information about newly generated VM containers from the workload. Then, we deploy the newly generated VM containers onto the system. The VM containers are created and assigned unique VM container IDs to identify them. The decision for the optimal placement of VM containers is based on the DQL algorithm. Based on the decision, the VM containers are migrated for resource optimization. Next, the workload allocation is updated based on the feedback from the simulation. Finally, the current state of the system, including the data center configuration, workload information, scheduler settings, environment variables, and any relevant statistics, is returned.

Step 2-Agent Initialization
In Algorithm 2, input_size is calculated as the sum of two terms. The first term, 2 * num_VM containers, represents the number of elements in the delay matrix. It is multiplied by 2 because each element in the delay matrix may have two possible values (e.g., minimum delay and maximum delay). The second term, num_hosts * num_VM containers, represents the number of VM containers multiplied by the number of hosts. This part accounts for the states related to mapping VM containers to hosts. Overall, input_size represents the size of the state representation that the agent receives from the environment. Similarly, output_size is determined as the product of the number of hosts (num_hosts) and the number of VM containers (num_VM containers). This represents the number of possible actions the agent can take, which corresponds to the different ways of mapping VM containers to hosts. Finally, the code creates an instance of the Agent class using the calculated input_size, output_size, and other hyperparameters such as learning_rate, gamma, epsilon, epsilon_decay, min_epsilon, replay_memory_size, and batch_size. This agent will be responsible for learning and making decisions in the cloud environment using the deep Q-learning algorithm.
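The size calculation described for Algorithm 2 can be illustrated with a short sketch. The function names and the example federation size are hypothetical, and the decoding of a flat action index into a (host, container) pair is one plausible convention rather than the one the authors necessarily used.

```python
def network_sizes(num_hosts, num_containers):
    """Return (input_size, output_size) as described for Algorithm 2."""
    delay_terms = 2 * num_containers              # two delay values per container
    placement_terms = num_hosts * num_containers  # container-to-host mapping
    input_size = delay_terms + placement_terms
    output_size = num_hosts * num_containers      # one action per (host, container)
    return input_size, output_size


def decode_action(action, num_containers):
    """Assumed convention: action = host * num_containers + container."""
    host, container = divmod(action, num_containers)
    return host, container


# Hypothetical example: 3 hosts and 4 VM containers.
input_size, output_size = network_sizes(num_hosts=3, num_containers=4)
last_action = decode_action(output_size - 1, num_containers=4)
```

With 3 hosts and 4 containers this gives an input of 20 values and 12 possible actions, which shows how quickly both sizes grow with the number of containers.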

Step 3-VM Placement
Algorithm 3 starts by initializing the decision array to store the VM placement decisions. It then iterates over each VM container in the VM container list. If the VM container is None, indicating no VM to place, it moves to the next iteration. Otherwise, the algorithm assigns the VM container ID to variable c. If c is equal to the selected VM container based on the agent's action, the algorithm appends the VM container ID and the selected host to the decision array. If c is different from the selected VM container, a random host index is chosen, and the VM container ID and the randomly selected host are appended to the decision array. After making the placement decision, the algorithm takes a step in the simulator, passing the selected host and selected VM container values to obtain the next state, reward, and done flag. These values are then appended to the replay memory. The total reward is incremented by the received reward, and the state is updated with the next state for the next iteration. Finally, the algorithm calculates the Q values and expected Q values, updates the loss function, adjusts the epsilon value, increments the episode counter, and outputs the episode count, total reward, and epsilon value.
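The decision-building loop of Algorithm 3 can be sketched as follows. This is a simplified illustration: containers are represented directly by their IDs (or None for an empty slot), and the function name and the `rng` parameter are assumptions introduced here.

```python
import random


def build_decisions(containers, selected_container, selected_host, num_hosts, rng=random):
    """Build the (container_id, host) decision array described for Algorithm 3."""
    decisions = []
    for container_id in containers:
        if container_id is None:
            continue  # empty slot: nothing to place
        if container_id == selected_container:
            # The agent's action fixes the host for this container.
            decisions.append((container_id, selected_host))
        else:
            # Every other container is assigned a randomly chosen host.
            decisions.append((container_id, rng.randrange(num_hosts)))
    return decisions


# Hypothetical example: slots 0 and 2 are occupied, slot 1 is empty.
decisions = build_decisions([0, None, 2], selected_container=2,
                            selected_host=1, num_hosts=3,
                            rng=random.Random(0))
```

The resulting array would then be passed to the simulator step, which returns the next state, reward, and done flag.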

Step 4-Agent Movement
Algorithm 4 outlines the procedure for agent movement using DQL. It begins by specifying the input parameters for the agent, including the input size, output size, learning rate, discount factor (gamma), exploration rate (epsilon), epsilon decay rate, minimum epsilon value, replay memory size, and batch size.
The number of episodes is set to a predefined value. For each episode, the algorithm initializes the state to its initial value and sets the done flag to indicate that the episode is ongoing. The total reward is initialized to track the cumulative reward obtained in the episode. The algorithm enters a loop that iterates until the episode is done. Within this loop, the agent selects an action based on an exploration-exploitation trade-off: if a random value is less than the exploration rate (epsilon), the agent chooses a random action; otherwise, it selects the action with the highest Q value for the current state. The algorithm then interacts with the environment, obtaining the next state, reward, and done flag. These values are stored in the replay memory. The total reward is updated by adding the reward obtained in the current step. The state is updated with the next state. The Q values for the current state are obtained from the prediction model, while the target Q values for the next state are obtained from the target model. The expected Q values are computed as the reward plus the discount factor (gamma) multiplied by the maximum target Q value. The loss function, calculated using the Mean Squared Error [22,23], measures the discrepancy between the predicted Q values and the expected Q values, allowing the agent to adjust its predictions. The exploration rate (epsilon) is decreased according to a decay rate until it reaches the minimum value. The episode count, total reward, and epsilon value are returned as the output. The algorithm repeats until all episodes are completed, facilitating the learning and improvement of the agent's decision-making process.
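The core steps of the loop can be sketched in plain Python. The sketch uses the standard epsilon-greedy convention (explore with probability epsilon) and the usual Q-learning target; all function names are hypothetical, and the neural networks are replaced by plain lists of Q values to keep the example self-contained.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


def td_target(reward, next_q_values, gamma, done):
    """Expected Q value: reward plus discounted best next-state value."""
    if done:
        return reward  # no future reward after a terminal step
    return reward + gamma * max(next_q_values)


def mse(predicted, expected):
    """Mean squared error between predicted and expected Q values."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)


def decay_epsilon(epsilon, decay, min_epsilon):
    """Multiplicative decay, clamped at the minimum exploration rate."""
    return max(min_epsilon, epsilon * decay)
```

In the full algorithm, `next_q_values` would come from the target model and the predicted values from the prediction model, with transitions sampled from the replay memory in batches.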

Performance Metrics
We evaluate the efficacy of our proposed solution using the following metrics.

Total Delay Calculation
Algorithm 5 aims to calculate the total delay for a given set of VM containers in a cloud environment. It begins by initializing the variable total to 0, which will accumulate the total delay. It then iterates over each VM container in the VM container list. If a VM container is None, indicating an empty slot, the algorithm skips it and continues to the next iteration. For each non-empty VM container, the algorithm retrieves the relevant host and VM container information, including the associated network latency. It also calculates the computation delay of the VM container by considering the remaining steps to be executed and the apparent Instructions Per Second (IPS). The term "Apparent" is used to indicate that the IPS value is adjusted or estimated based on the observed performance characteristics of the system. This adjustment helps to provide a more realistic measure of the effective processing capacity of the system when calculating the computation delay for a VM container. The total delay of the VM container is obtained by summing the network delay and the computation delay. This total delay is then added to the total variable. Finally, after iterating through all the VM containers, the algorithm returns the accumulated value of the total, representing the overall delay experienced by the non-empty VM containers in the given cloud environment.
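A minimal sketch of the total delay calculation follows. The dictionary keys and the assumption that the computation delay equals the remaining work divided by the apparent IPS are illustrative; the text only states that both quantities are considered.

```python
def total_delay(containers):
    """Sum network and computation delay over all non-empty containers."""
    total = 0.0
    for container in containers:
        if container is None:
            continue  # empty slot
        network_delay = container["host_latency"]
        # Assumed form: remaining instructions divided by apparent IPS.
        compute_delay = container["remaining_instructions"] / container["apparent_ips"]
        total += network_delay + compute_delay
    return total


# Hypothetical containers; the values are illustrative only.
example = [
    {"host_latency": 0.5, "remaining_instructions": 100.0, "apparent_ips": 50.0},
    None,
    {"host_latency": 0.25, "remaining_instructions": 30.0, "apparent_ips": 10.0},
]
delay = total_delay(example)
```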

Total Power Consumption Calculation
Algorithm 6 calculates the total power consumption for a list of hosts in a cloud environment. The SPECpower_ssj2008 [24] was used as the benchmark to represent CPU load and corresponding power utilization. It uses the left_range and right_range variables to represent the power consumption values at different utilization levels. left_range corresponds to the lower utilization level (index) in the list of power utilization, while right_range represents the higher utilization level (index + 1). The alpha variable determines the proportion of power consumption between left_range and right_range based on the fractional part of the CPU utilization divided by 10. By performing linear interpolation with alpha, the algorithm estimates the power consumption for the given CPU utilization, ensuring a smooth transition between utilization levels. The power consumption estimation is returned as the result of the power function, calculated by multiplying alpha with right_range and (1 − alpha) with left_range and then summing the two values.
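The interpolation step can be sketched as follows. The power table below is hypothetical and is not taken from any SPECpower_ssj2008 result; it only stands in for a list of power values at 0%, 10%, ..., 100% CPU load. The variable names left_range, right_range, and alpha follow the description above.

```python
import math

# Hypothetical power draw in watts at 0%, 10%, ..., 100% CPU load.
POWER_AT_LOAD = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0,
                 110.0, 120.0, 130.0, 140.0, 150.0]


def host_power(cpu_percent, power_at_load=POWER_AT_LOAD):
    """Linearly interpolate power between the two nearest load levels."""
    cpu_percent = min(max(cpu_percent, 0.0), 100.0)  # clamp to the table range
    index = math.floor(cpu_percent / 10)
    left_range = power_at_load[index]
    # At 100% load there is no higher level, so reuse the last entry.
    right_range = power_at_load[min(index + 1, len(power_at_load) - 1)]
    alpha = cpu_percent / 10 - index  # fractional part of utilization / 10
    return alpha * right_range + (1 - alpha) * left_range


def total_power(cpu_loads):
    """Total power over a list of per-host CPU utilizations (percent)."""
    return sum(host_power(load) for load in cpu_loads)
```

For example, a host at 25% load falls halfway between the 20% and 30% entries, so its estimated power is the midpoint of those two values.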

Migration Cost Calculation
Algorithm 7 is a function that calculates the migration cost in a system. The algorithm takes input parameters decisions, totalBW, and env. It initializes variables total and migrationTime to 0, and routerBwToEach_Host is computed. The algorithm iterates over each container_id and corresponding new_host_id from decisions. For each VM container, it retrieves the VM container object and checks if it exists. If not, it returns 0 as the migration cost. If migration is required, it computes the migration time based on the VM container size and allocated bandwidth. Additionally, it adds the latency difference if the new host ID is valid. The migration time is added to the total, representing the cumulative migration cost. In the end, the algorithm returns the total migration cost.
Finally, the reward function is calculated as illustrated in Equation (6):
reward(env, decision) = −(total_delay(env) + total_power_consumption(env) + total_migration_cost(env, decision)) (6)
Here, the reward function takes the environment state and decision as input. It calculates the total delay, total power consumption, and total migration cost in the given environment using corresponding functions. The resulting values are then summed together and negated to represent the negative reward. The initial weights for delay and power consumption are set to 1 to maintain a balanced consideration between these metrics. A value of 1 means that both metrics are considered equally important initially. Beyond that point, the weight values are determined by the action of the agent and the state space, which is a result of the number of containers and hosts in the system. The state space accounts for the information needed to represent the entire state of the system. The reward scaling function determines the subsequent actions, and it is configured to have a reward range from −1 to 1. Here, −1 represents a negative reward (penalty), and 1 represents a positive reward (reinforcement). This allows the agent to maximize rewards and improve its performance over time.
The overall migration cost is calculated for a list of container migration decisions made by the agent during a specific time slice. It sums up the individual migration costs for each container, considering the container's size, migration time, and CPU utilization. This cost is essential for the agent to optimize its actions, considering the trade-offs between the benefits of moving containers to more suitable hosts and the cost associated with the migration process.
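The migration cost and the negated-sum reward can be sketched together. This is a simplified illustration under several assumptions: the bandwidth is split evenly across hosts, the latency difference is taken as an absolute value, missing containers are skipped rather than ending the calculation, and all names and data structures are hypothetical.

```python
def migration_cost(decisions, total_bw, containers, host_latency):
    """Sum migration times over (container_id, new_host_id) decisions."""
    bw_per_host = total_bw / len(host_latency)  # assumption: even split
    total = 0.0
    for container_id, new_host_id in decisions:
        container = containers.get(container_id)
        if container is None or container["host"] == new_host_id:
            continue  # unknown container or no migration needed
        migration_time = container["size_mb"] / bw_per_host
        # Assumed form of the latency difference between the two hosts.
        migration_time += abs(host_latency[new_host_id]
                              - host_latency[container["host"]])
        total += migration_time
    return total


def reward(delay, power, migration, w_delay=1.0, w_power=1.0, w_migration=1.0):
    """Negated weighted sum, with all weights initially 1 as in Equation (6)."""
    return -(w_delay * delay + w_power * power + w_migration * migration)


# Hypothetical example: container 1 moves from host 0 to host 1,
# container 2 stays on host 1.
containers = {1: {"host": 0, "size_mb": 100.0},
              2: {"host": 1, "size_mb": 50.0}}
cost = migration_cost([(1, 1), (2, 1)], total_bw=200.0,
                      containers=containers, host_latency=[0.5, 1.0])
```

Because the reward is the negative of the total cost, a decision that avoids an unnecessary migration yields a strictly higher reward when delay and power are unchanged.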

Experimental Setup
We conducted experiments using a simulation framework that emulates the FCC environment. The simulation framework uses the real-world workload and energy consumption data as benchmarks. We compare the performance of our proposed solution with existing solutions in the literature in the subsequent section.
As shown in Algorithm 8, the generator first initializes an empty list to store the generated VM containers. It then iterates over a range determined by a Gaussian distribution of the number of VM containers, ensuring that at least 1 VM container is generated. For each iteration, it assigns a CreationID and selects a random index. To create a realistic workload representation, we generated new VM containers based on a real-world workload. The Azure 2017 workload dataset [25,26] was used as a benchmark for the simulation framework. This dataset consists of CSV files that provide information about CPU utilization in Megahertz and workload characteristics in the Azure cloud environment, recorded at intervals of 5 min for a total of 30 days. These CSV files are read as DataFrames.
SLA values are generated based on a Gaussian distribution. IPS and other metrics are calculated using the DataFrame values. Instances of different classes are created with the calculated values. The tuples containing the VM container details are appended to workloadlist and creation_id is incremented. The generated VM containers are added to generated_VM_containers. Finally, the method returns the result of a method called deployed_VM_containers. Overall, this code generates new VM containers for the workload based on various parameters and metrics, creating a diverse workload representation. The purpose of using the Gaussian distribution in the for loop is to introduce randomness and variability in the generated VM containers [27]. The number of iterations in the for loop, which represents the number of VM containers to be generated, is determined by sampling from a Gaussian distribution. In the code, the range() function is used with a parameter of max(1, int(gauss(mean, sigma))). The gauss() function is from the random module and generates random numbers following a Gaussian distribution (also known as a normal distribution). As a result, the number of VM containers generated tends to cluster around the mean value, with a level of randomness determined by the sigma value, creating a more realistic and diverse workload [28]. This supports the view that the generated workload is representative of other cloud environments as well. The approach produces workload files with different numbers of VM containers each time the method is called, adding variability to the workload and potentially simulating real-world scenarios where the number of VM containers fluctuates.
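The sampling expression max(1, int(gauss(mean, sigma))) quoted above can be exercised in isolation. The mean and sigma values and the fixed seed below are hypothetical and are used only to make the illustration repeatable.

```python
import random


def sample_container_count(mean, sigma, rng):
    """Number of VM containers to generate, never fewer than 1."""
    return max(1, int(rng.gauss(mean, sigma)))


rng = random.Random(0)  # fixed seed so the illustration is repeatable
counts = [sample_container_count(mean=5, sigma=2, rng=rng) for _ in range(1000)]

# Each count would drive one call of the generation loop, e.g.:
# for _ in range(sample_container_count(5, 2, rng)): ...
```

Note that int() truncates toward zero and the max() clamp maps every non-positive draw to 1, so the realized distribution is slightly skewed relative to a true Gaussian when the mean is small compared with sigma.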
For the DQL algorithm, the Gymnasium library was utilized [29]. We adopted a learning rate of 0.001. The learning rate determines the step size during weight updates in the neural network. A smaller learning rate helps the network make more gradual adjustments, which is crucial for achieving stable learning in complex environments. The discount factor (gamma) was set to 0.99. This parameter controls the balance between immediate and future rewards. By giving future rewards a non-zero weight, the agent is encouraged to consider the long-term consequences of its actions, promoting more strategic decision-making.
To handle the exploration-exploitation trade-off, we implemented an epsilon-greedy policy [30]. Initially, the exploration rate (epsilon) was set to 1.0, meaning the agent mostly explored the environment at the beginning of training. We chose a high initial epsilon to encourage sufficient exploration and avoid prematurely converging to suboptimal policies. Over time, the epsilon value was set to decay with a factor of 0.995, gradually reducing the agent's exploration rate. This decay allows the agent to shift its focus towards exploiting the knowledge it acquired during training, ultimately improving the quality of its decisions.
We set the minimum exploration rate (minimum epsilon) to 0.01. This ensures that the agent maintains some level of exploration even after extensive training, preventing it from becoming overly deterministic and potentially missing better solutions. To facilitate efficient learning and sample diversity, we used a replay memory size of 10,000. The replay memory stores past experiences, enabling the agent to learn from a random selection of experiences rather than relying solely on the most recent ones. A large replay memory helps the agent generalize from its experiences and reduces the bias introduced by correlated, consecutive samples. The batch size for updating the neural network was set to 32. Updating on a batch of experiences rather than a single one leads to more stable and efficient learning.
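The replay mechanism described above can be sketched with a bounded deque. This is an illustrative sketch rather than the authors' implementation: the class name ReplayMemory, the transition layout (state, action, reward, next_state, done), and the optional seed parameter are assumptions.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of past transitions with uniform random sampling."""

    def __init__(self, capacity=10_000, seed=None):
        # deque(maxlen=...) silently evicts the oldest transition when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition."""
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Return batch_size distinct transitions chosen uniformly at random."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

In a training loop, sampling would typically start only once the buffer holds at least one full batch, for example by checking len(memory) >= 32 before calling memory.sample().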
Overall, these parameter settings were carefully chosen to strike a balance between exploration and exploitation, optimize learning performance, and ensure convergence of the DQL algorithm.
The experiments were run on a system with an Intel i7 processor, 16 GB RAM along with Python(3.
HGP is an auto-tuning algorithm that leverages Gaussian processes to find optimal configurations for stream processing systems within a limited experimental budget. It captures posterior distributions of the configuration spaces using Bayesian Optimization.
IQR-MMT is an energy-efficient approach for resource management in cloud computing. It uses the Cuckoo Optimization Algorithm (COA) to detect over-utilized hosts and employs the MMT policy to migrate VMs to achieve better resource utilization and lower energy consumption.
MAD-MMT focuses on efficiently managing cloud resources through effective VM selection policies and hotspot detection mechanisms. The introduced policies, Median Migration Time and Maximum Utilization, aim to minimize energy consumption, service level agreement violations, and the number of migrations.
RLR-MMT aims to optimize resource utilization and energy efficiency in cloud data centers. It proposes a Logistic Regression-based host overloading prediction technique for VM consolidation, migrating or consolidating VMs to prevent host overloading.
GA is an approach for energy reduction in cloud data centers. It tackles the NP-Hard problem of container consolidation using heuristic and metaheuristic algorithms. The proposed Energy Efficient Genetic Algorithm (EEGA) attempts to optimize energy consumption and resource utilization.
As shown in Figures 5 and 6, QLearning demonstrates efficient CPU utilization with a median value of 29.02, second only to HGP with 31.22. The remaining models, including IQR, MAD, RLR, and GA, have lower CPU usage percentages, ranging from 23.22 to 28.93. The median is used because real cloud workloads tend to have asymmetric distributions with frequent spikes [44], so the mean would be significantly affected by outliers. The median is more robust, as it is less sensitive to extreme values in the context of CPU utilization [45]. The violin plot illustrates the central tendency of CPU utilization across the different models' workloads while also showing the spread and shape of the distributions.
Among the evaluated models in Figure 11, QLearning exhibits the lowest energy consumption, with an average of 1.85 kWh and a negligible standard deviation. The other models, HGP, IQR, MAD, RLR, and GA, have energy consumption levels relatively close to that of QLearning, ranging from 1.93 kWh to 2.00 kWh.
In summary, the QLearning model performs better than the other evaluated models on most metrics. It consumes the least energy, achieves a high number of finished tasks, requires minimal migration time, experiences fewer SLA violations, and utilizes CPU resources efficiently, second only to HGP. These findings highlight the strength of QLearning in FCC environment scenarios and make it an attractive choice for optimizing performance.
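The robustness argument for reporting the median CPU utilization can be checked on a small made-up sample. The utilization values below are hypothetical and are not taken from the experiments; they only show how a few spikes move the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical CPU utilization samples (percent); not experimental data.
steady = [28.0, 29.0, 29.5, 30.0, 28.5, 29.0, 30.5, 29.0]
with_spikes = steady + [95.0, 98.0]  # two short load spikes added

# How far each statistic moves once the spikes are included.
median_shift = median(with_spikes) - median(steady)
mean_shift = mean(with_spikes) - mean(steady)
```

In this toy sample the median moves by a fraction of a percentage point while the mean moves by more than ten, which is the behaviour that motivates the choice of the median for spiky workloads.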

Discussion
The results of the simulations are illustrated in Table 1.
Energy Consumption: The QLearning model consumes approximately 5.5% less energy compared to the average energy consumption of the other models. This highlights QLearning's efficiency in utilizing resources, leading to reduced energy costs and environmental impact.
Sum of Finished Tasks: QLearning achieves approximately 3.2% more finished tasks than the average number of finished tasks of the other models. This indicates that QLearning demonstrates a slightly higher capability to successfully complete tasks, showcasing its effectiveness in task management.
Migration Time: QLearning completes migrations approximately 50.5% faster than the average migration time of the other models. This substantial improvement in migration time emphasizes QLearning's ability to swiftly adapt and transfer workloads, resulting in minimized system downtime and improved overall performance.
Sum of SLA Violations: QLearning exhibits exceptional performance by having approximately 51.6% fewer SLA violations compared to the average number of violations observed in the other models. This highlights QLearning's reliability and adherence to service level agreements, ensuring better service quality and customer satisfaction.
Median CPU Usage Percentage: QLearning utilizes CPU resources approximately 8.1% more efficiently than the average CPU usage percentage of the other models. This suggests that QLearning optimizes the allocation and distribution of computational resources, leading to enhanced performance and resource utilization.
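The percentages above compare QLearning against the average of the other models. The exact formula is not given in the text, so the sketch below shows one plausible way to compute such a figure, assuming the difference is normalized by the baseline average. The function name percent_vs_baseline_mean and the numbers in the usage lines are placeholders and do not come from Table 1.

```python
from statistics import mean


def percent_vs_baseline_mean(candidate, baselines, lower_is_better):
    """Relative improvement of candidate over the mean of the baselines, in %.

    For metrics where lower is better (energy, migration time, SLA
    violations) a positive result means the candidate is lower than the
    baseline mean; for metrics where higher is better (finished tasks)
    a positive result means it is higher.
    """
    baseline_mean = mean(baselines)
    if lower_is_better:
        delta = baseline_mean - candidate
    else:
        delta = candidate - baseline_mean
    return 100.0 * delta / baseline_mean


# Placeholder values for illustration only.
example_lower = percent_vs_baseline_mean(8.0, [10.0, 12.0, 9.0, 11.0, 8.0], True)
example_higher = percent_vs_baseline_mean(110.0, [100.0, 100.0], False)
```

Stating the normalization explicitly matters, since dividing by the candidate's value instead of the baseline average would yield different percentages for the same data.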
While QLearning does not surpass all of the other models in terms of CPU usage efficiency, it excels in significant aspects such as energy consumption, migration time, SLA violations, and finished tasks. These strengths highlight QLearning's potential to improve resource management, system stability, and overall efficiency in FCC environments. The significance of these findings lies in the fact that they support a solution that optimizes VM placement and energy efficiency while preserving SLA compliance in FCC environments.
FCC environments are highly complex due to the involvement of multiple DCs and the need for efficient workload management across them. Traditional methods often struggle to handle the complexity and dynamic nature of these environments. FEDQWP handles complex and multi-dimensional scenarios by combining explorative techniques with neural networks, which makes it well-suited for addressing the challenges posed by FCC environments. One of the key strengths of FEDQWP is its ability to learn from past experiences through a trial-and-error approach. By continuously interacting with the environment and optimizing its actions based on rewards and penalties, the DQL model gradually improves its decision-making capabilities, leading to better performance over time. Whereas other methods rely on extensive training on historical data, the explorative nature of our solution allows it to make fast decisions with less dependence on such data. Another advantage is the consideration of multiple important metrics, including energy consumption, finished tasks, migration time, SLA violations, and CPU usage. These factors collectively make the DQL model a comprehensive and effective solution for workload prediction in FCC environments.
While the proposed FEDQWP method shows promising performance, there are a few potential limitations to consider. We evaluate FEDQWP's performance using real-world workloads, but the generalizability of the model to a wide range of FCC environments should be considered. The performance of the DQL model may vary when applied to different data center architectures, workload characteristics, and resource allocation policies. Changes in workload patterns or infrastructure dynamics over time may require continuous retraining of the model to maintain its performance. This might involve having to tune various hyperparameters, such as the learning rate, discount factor, and exploration rate, to achieve optimal results.

Conclusions
In this study, we proposed FEDQWP, a novel approach to address the challenges of energy and SLA-aware workload prediction and VM scheduling in geographically distributed FCC environments using DQL. Our proposed approach demonstrated significant improvements over the existing methods in terms of energy efficiency and SLA compliance. The use of DQL in this study opens up exciting possibilities for the optimization of FCC resources in the future. We can thus continue to improve the efficiency, sustainability, and scalability of FCC by leveraging the power of ML and supporting the growth of the futuristic economy.

Future Work
Overall, our proposed approach provides a comprehensive and effective solution to the challenges faced by CSPs in geographically distributed FCC environments. Future research could explore the application of our approach to different types of workloads, such as real-time applications or big data processing, and investigate the potential of combining our approach with other optimization techniques to further improve its performance.

Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/Azure/AzurePublicDataset/blob/master/AzurePublicDatasetV1.md (AzurePublicDatasetV1, accessed on 6 February 2023).

Conflicts of Interest:
The authors declare no conflict of interest.